Biomedical Association Discovery via Complementary TDM
Abstract
Ever-growing textual data make it increasingly difficult to use all the information relevant to our interests effectively. For example, Medline, a comprehensive bibliographic database in life science, currently contains over 16 million articles, and the number is rising rapidly by 2,000–4,000 per day. Given the substantial volume of information, it is crucial to develop intelligent information processing techniques, namely text data mining (TDM), that could help us manage the information overload. While its importance is clear, the term TDM has been used somewhat vaguely within the related research communities. In a broader sense, it refers to any text processing equipped with some intelligence, such as information extraction, text categorization, and summarization, whereas in a more restricted sense it refers only to the discovery of heretofore unknown knowledge [1]. The most significant difference between the two is that the former focuses on information explicitly stated in text, and the latter on implicit knowledge that could be discovered by synthesizing fragments of information extracted from a large volume of textual data. Historically, these two types of TDM have been tackled in largely separate research communities. In our ongoing research projects, we are in part developing a TDM system (in the more restricted sense) to discover unknown gene-disease associations through an analysis of the biomedical literature [2]. The system is based on an extension of an information retrieval model and is capable of predicting causative genes for a given hereditary disease. Independently, we are also working on automatic annotation of gene functions represented by Gene Ontology (GO) terms based on textual descriptions extracted from journal articles [4], which was partly motivated by the TREC Genomics Track 2004 and the first BioCreative workshop.
The annotation framework consists of flexible gene name matching and text categorization modules and is currently regarded as state of the art. This latter work can be considered TDM in the broader sense, attempting to abstract semantic knowledge embedded in natural language text. Although the aforementioned projects, gene-disease association discovery and automatic GO annotation, are characterized by different aspects of TDM, they are closely related and may complement each other, since genes are the critical entities in both frameworks. The next sections describe some details of these two projects and discuss how they could be integrated to help each other.
Similar resources
A Semantic Approach for Mining Hidden Links from Complementary and Non-interactive Biomedical Literature
Two complementary and non-interactive literature sets of articles, when they are considered together, can reveal useful information of scientific interest not apparent in either of the two sets alone. Swanson called the existence of such hidden links as undiscovered public knowledge (UPK). The novel connection between Raynaud disease and fish oils was uncovered from complementary and non-intera...
Corporate Social Responsibility Reports: Understanding Topics via Text Mining
This study utilizes Text Data Mining (TDM) to analyze the contents of Corporate Social Responsibility (CSR) Reports. The goal is to find evidence that environmental sustainability has become embedded in corporate policy and the core business discourse of seven organizations over 2004-2012. Results from supervised modeling techniques suggest embeddedness of environmental qualities in the busines...
TDM to IP Migration: Your Network, Your Timeline
Network technology based on the Internet Protocol has now advanced to the point that data traffic of any kind can be as reliably transported via Ethernet as it can via TDM. This paper explores the strengths and weaknesses of both TDM and Ethernet transport within the context of microwave radio communications. The paper further describes the operational advantages of Ethernet and Ethernet radio ...
Macrophage activation by cord factor (trehalose 6,6'-dimycolate): enhanced association with and intracellular killing of Trypanosoma cruzi.
Cord factor (trehalose 6,6'-dimycolate [TDM]), a mixture of 6,6'-diesters of alpha, alpha-D-trehalose with natural mycolic acids, has been described as having immunoregulatory and antitumor activities in vivo, although the relevant mechanisms of action remain unelucidated. In this work, we measured the effects of TDM on both mouse macrophage association with (i.e., the combined result of surfac...
TCMGeneDIT: a database for associated traditional Chinese medicine, gene and disease information using text mining
BACKGROUND: Traditional Chinese Medicine (TCM), a complementary and alternative medical system in Western countries, has been used to treat various diseases over thousands of years in East Asian countries. In recent years, many herbal medicines were found to exhibit a variety of effects through regulating a wide range of gene expressions or protein activities. As available TCM data continue to a...
Publication date: 2008